
[TRTLLM-13246][feat] Wave 1: migrate aliases to setup_aliases and stage GMS RO load#15014

Open
chienchunhung wants to merge 1 commit into NVIDIA:main from chienchunhung:feat/staged-hooks-wave1-gms-ro


@chienchunhung

@chienchunhung chienchunhung commented Jun 5, 2026

Collaborator

Summary

Wave 1 of the staged post-load hooks rollout. The staged-hook contract landed in #14770, and #14878 has now merged, so this PR is a single Wave 1 commit on top of main.

This change migrates alias-only model hooks from post_load_weights() to setup_aliases() and cuts the GMS read-only (RO) load path over from the old meta-tensor workaround to the staged-hook protocol.
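To make the hook split concrete, here is a minimal sketch in plain Python. `StagedModule`, `ToyForCausalLM`, `DecoderLayer`, and `Norm` are hypothetical stand-ins, not TensorRT-LLM classes, and the `next_layer_layernorm` chaining is only an illustrative example of an alias-only body: it rebinds references between existing submodules and performs no tensor math. The sketch shows the body moving unchanged from `post_load_weights()` into `setup_aliases()`, with the base `post_load_weights()` still invoking it on standard load paths.

```python
class StagedModule:
    """Hypothetical base class sketching the staged post-load hook split."""

    def setup_aliases(self):
        """Alias-only hook: bind references between existing submodules."""
        pass

    def post_load_weights(self):
        # Hypothetical base orchestrator: standard (non-GMS) load paths still
        # call post_load_weights(), which now delegates alias setup to the hook.
        self.setup_aliases()


class Norm:
    """Placeholder for a normalization module."""


class DecoderLayer:
    def __init__(self):
        self.input_layernorm = Norm()
        self.next_layer_layernorm = None  # filled in by setup_aliases()


class ToyForCausalLM(StagedModule):
    def __init__(self, num_layers):
        super().__init__()
        self.layers = [DecoderLayer() for _ in range(num_layers)]
        self.norm = Norm()

    # Illustrative alias-only body: before the migration it would have lived
    # in a post_load_weights() override; it moves here verbatim.
    def setup_aliases(self):
        for idx, layer in enumerate(self.layers):
            if idx == len(self.layers) - 1:
                layer.next_layer_layernorm = self.norm
            else:
                layer.next_layer_layernorm = self.layers[idx + 1].input_layernorm
```

Since the body only rebinds references, it does not need real tensor data, which is consistent with the GMS RO ordering that runs alias setup before materialization.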

What Changed

  • Alias migration across 7 modeling modules: modeling_llama (LlamaForCausalLM, Llama4ForConditionalGeneration), modeling_deepseekv3, modeling_glm, modeling_exaone_moe, modeling_qwen3_moe, modeling_qwen3_next, and modeling_gpt_oss. Their alias-only post_load_weights() bodies move verbatim into setup_aliases(), while standard load paths continue through the base post_load_weights() orchestrator.
  • GMS RO ordering now runs staged hooks around zero-copy materialization:
    post_load_apply
      -> _setup_aliases(model)                    # recursive alias walk
      -> _check_gms_source_identity(gms_backend)  # STRICT pre-materialize gate from #14878
      -> materialize_module(model)
      -> _walk_cache_state(model)
      -> post_load_publish
  • GMS docs now describe alias setup before materialization and derived-state refresh after real tensors are bound.
  • Full reload now resets existing _weights_transformed flags before rebinding fresh weights, while partial reload keeps existing transform guards intact for untouched modules.
  • Tests cover the staged walkers, reload reset, and GMS RO ordering in test_model_loader_gms.py and test_model_loader_mx.py.
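The GMS RO ordering can be sketched as a small driver over a module tree. This is a hypothetical model in plain Python: `Node`, `load_gms_read_only`, and the `source_matches` flag are illustrative stand-ins for `nn.Module`, the real loader, and `_check_gms_source_identity(gms_backend)`, and the log entries only record which phase ran. What it demonstrates is the constraint the ordering encodes: aliases are wired while tensors are still placeholders, the identity gate fails before anything is bound, and derived state is computed only once real tensors are in place.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical module-tree node standing in for an nn.Module."""
    name: str
    children: List["Node"] = field(default_factory=list)
    materialized: bool = False
    has_alias_hook: bool = False
    has_cache_hook: bool = False

    def modules(self):
        # Pre-order traversal, similar in spirit to nn.Module.modules().
        yield self
        for child in self.children:
            yield from child.modules()


def load_gms_read_only(model, source_matches, log):
    """Hypothetical GMS RO driver mirroring the ordering in the PR description."""
    log.append("post_load_apply")
    # 1. Recursive alias walk: only wires references, so placeholders are fine.
    for m in model.modules():
        if m.has_alias_hook:
            log.append(f"setup_aliases:{m.name}")
    # 2. Strict identity gate: fail before any zero-copy binding happens.
    if not source_matches:
        raise RuntimeError("GMS source identity mismatch; refusing to materialize")
    log.append("check_source_identity")
    # 3. Bind real tensors in place of placeholders (modeled as a flag here).
    for m in model.modules():
        m.materialized = True
    log.append("materialize")
    # 4. Derived state depends on real tensor values, so it runs last.
    for m in model.modules():
        if m.has_cache_hook:
            assert m.materialized
            log.append(f"cache_state:{m.name}")
    log.append("post_load_publish")
```

A mismatched source leaves the tree unmaterialized, which is the property a STRICT pre-materialize gate is meant to guarantee.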

Dependency / prerequisite stack

This PR is Wave 1 in the staged post-load hooks rollout. The foundation PRs #14770 and #14878 are already merged. The wave PRs should merge in sequence; after each upstream wave lands, rebase the next wave onto main so review and CI focus on that wave's delta.

Arrows point from prerequisite to dependent. PR numbers in graph nodes are clickable.

graph TD
    PR14770["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14770'>#14770</a>: staged-hook contract (merged)"]
    PR14878["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14878'>#14878</a>: GMS SourceIdentity gate (merged)"]
    PR15014["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15014'>#15014</a>: Wave 1 aliases + GMS RO load (this PR, open)"]
    PR15288["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15288'>#15288</a>: Wave 2 Linear/Attention transforms (draft)"]
    PR15386["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15386'>#15386</a>: Wave 3 MoE/Mamba staged hooks (draft)"]
    PR15387["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15387'>#15387</a>: Wave 4 MX receiver cutover (draft)"]
    PR15432["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15432'>#15432</a>: Wave 5 MX publisher + Llama receiver (draft)"]
    VERIFY["post-migration verification / demo (planned)"]

    PR14770 -->|satisfied| PR15014
    PR14878 -->|satisfied| PR15014
    PR15014 -->|blocking| PR15288
    PR15288 -->|blocking| PR15386
    PR15386 -->|blocking| PR15387
    PR15387 -->|blocking| PR15432
    PR15432 -.->|planned| VERIFY

    classDef merged fill:#dcfce7,stroke:#16a34a,color:#14532d;
    classDef inflight fill:#dbeafe,stroke:#2563eb,color:#1e3a8a;
    classDef draft fill:#ffedd5,stroke:#f97316,color:#7c2d12;
    classDef current fill:#ede9fe,stroke:#7c3aed,color:#3b0764,stroke-width:3px;
    classDef downstream fill:#f3f4f6,stroke:#6b7280,color:#374151,stroke-dasharray:5 5;
    linkStyle 0,1 stroke:#16a34a,stroke-width:2px;
    linkStyle 2,3,4,5 stroke:#ea580c,stroke-width:3px;
    linkStyle 6 stroke:#6b7280,stroke-width:2px,stroke-dasharray:5 5;

    class PR14770,PR14878 merged;
    class PR15288,PR15386,PR15387,PR15432 draft;
    class PR15014 current;
    class VERIFY downstream;

Immediate merge dependency for this PR: none beyond the already-merged #14770 and #14878 foundation PRs; downstream waves remain stacked on this branch until Wave 1 lands.

Test Plan

  • pytest tests/unittest/_torch/pyexecutor/test_model_loader_gms.py tests/unittest/_torch/pyexecutor/test_model_loader_mx.py
  • Full L0 CI with /bot run --disable-fail-fast before review.
  • Local checks completed: git diff --check, py_compile on changed Python files, and pre-commit. Local pytest was not available in this macOS shell.

Next Steps

  • Wave 2: migrate Linear/Attention transforms into transform_weights() with _weights_transformed guards.
  • Wave 3: migrate MoE and Mamba transforms.
  • Wave 4: MX publish-after-transform flip, receiver cutover, and per-model allow-list.
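The `_weights_transformed` guards planned for Wave 2 interact with the reload reset described under What Changed. A minimal sketch, assuming a hypothetical `Linear` class whose transform simply doubles each weight and a hypothetical `reload_weights()` helper; neither reflects the actual TensorRT-LLM implementation. It also assumes that a module receiving fresh raw weights during a partial reload must be re-transformed, so its own flag is cleared; the PR text only states that guards on untouched modules stay intact.

```python
class Linear:
    """Hypothetical module with a transform guarded by _weights_transformed."""

    def __init__(self, weight):
        self.weight = list(weight)
        self._weights_transformed = False

    def transform_weights(self):
        # Guard: applying a non-idempotent transform twice would corrupt weights.
        if self._weights_transformed:
            return
        self.weight = [2 * w for w in self.weight]  # stand-in for a real transform
        self._weights_transformed = True


def reload_weights(modules, new_weights, full):
    """Hypothetical reload helper.

    Full reload: every guard is reset before rebinding (it is expected to
    rebind every module). Partial reload: guards on untouched modules stay
    set, so their already-transformed weights are not transformed again.
    """
    if full:
        for m in modules.values():
            m._weights_transformed = False
    for name, w in new_weights.items():
        modules[name].weight = list(w)
        # Assumption: freshly rebound raw weights always need the transform.
        modules[name]._weights_transformed = False
    for m in modules.values():
        m.transform_weights()
```

Without the reset, a full reload would rebind raw weights under a stale `True` flag and silently skip the transform; without the guard, a partial reload would transform untouched modules a second time.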

PR Checklist

  • PR description clearly explains what and why.
  • Follows TRT-LLM coding guidelines to the best of my knowledge.
  • Test cases are provided for new code paths.
  • No public API changes.
  • No new dependencies.

Summary by CodeRabbit

  • Refactor
    • Restructured internal model initialization and layer alias resolution for improved stability during weight loading.
    • Optimized GPU memory management during model setup and state caching.
    • Updated model loading sequence for read-only GPU memory paths to ensure correct operation ordering.

@chienchunhung chienchunhung changed the title [TRTLLM-13077][feat] Wave 1: migrate aliases to setup_aliases and stage GMS RO load [TRTLLM-13246][feat] Wave 1: migrate aliases to setup_aliases and stage GMS RO load Jun 5, 2026
@chienchunhung chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from 6456b20 to ac30c0a Compare June 5, 2026 20:31
@chienchunhung

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #52438 [ run ] triggered by Bot. Commit: ac30c0a Link to invocation

@chienchunhung chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch 2 times, most recently from 690c0c8 to ac30c0a Compare June 5, 2026 22:16
@chienchunhung

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #52445 [ run ] triggered by Bot. Commit: ac30c0a Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #52438 [ run ] completed with state ABORTED. Commit: ac30c0a

Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #52445 [ run ] completed with state FAILURE. Commit: ac30c0a
/LLM/main/L0_MergeRequest_PR pipeline #41738 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from ac30c0a to eabb7c0 Compare June 8, 2026 23:53
@chienchunhung

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #52887 [ run ] triggered by Bot. Commit: eabb7c0 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #52887 [ run ] completed with state FAILURE. Commit: eabb7c0
/LLM/main/L0_MergeRequest_PR pipeline #42137 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from eabb7c0 to c42781c Compare June 11, 2026 21:22

Collaborator Author

/bot run

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #53683 [ run ] triggered by Bot. Commit: c42781c Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #53687 [ run ] triggered by Bot. Commit: c42781c Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #53683 [ run ] completed with state ABORTED. Commit: c42781c

Link to invocation

@chienchunhung chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from c42781c to 4352612 Compare June 12, 2026 00:28
@chienchunhung chienchunhung marked this pull request as ready for review June 12, 2026 00:32
@chienchunhung chienchunhung requested review from a team as code owners June 12, 2026 00:32
@tensorrt-cicd

Collaborator

PR_Github #54636 [ run ] completed with state SUCCESS. Commit: 85973b0
/LLM/main/L0_MergeRequest_PR pipeline #43668 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@chienchunhung

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #54668 [ run ] triggered by Bot. Commit: 85973b0 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #54668 [ run ] completed with state SUCCESS. Commit: 85973b0
/LLM/main/L0_MergeRequest_PR pipeline #43700 completed with status: 'SUCCESS'

CI Report

Link to invocation

@brb-nv brb-nv left a comment

Collaborator


Changes to model files under tensorrt_llm/_torch/models/ look good to me.

@chienchunhung chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from 85973b0 to 7eec9fe Compare June 23, 2026 05:06
@chienchunhung

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #55160 [ run ] triggered by Bot. Commit: 7eec9fe Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #55160 [ run ] completed with state SUCCESS. Commit: 7eec9fe
/LLM/main/L0_MergeRequest_PR pipeline #44135 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from 7eec9fe to a260f3b Compare June 23, 2026 16:53
@chienchunhung

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #55281 [ run ] triggered by Bot. Commit: a260f3b Link to invocation

…ge GMS RO load

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
@chienchunhung chienchunhung requested a review from litaotju June 23, 2026 17:03
@chienchunhung chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from a260f3b to a21d821 Compare June 23, 2026 17:03

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #55287 [ run ] triggered by Bot. Commit: a21d821 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #55281 [ run ] completed with state ABORTED. Commit: a260f3b

Link to invocation

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #55303 [ run ] triggered by Bot. Commit: a21d821 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #55287 [ run ] completed with state ABORTED. Commit: a21d821

Link to invocation

@chienchunhung chienchunhung requested a review from QiJune June 23, 2026 17:59

@mikeiovine mikeiovine left a comment

Collaborator


Signing off on PyExecutor related changes

@tensorrt-cicd

Collaborator

PR_Github #55303 [ run ] completed with state FAILURE. Commit: a21d821
/LLM/main/L0_MergeRequest_PR pipeline #44253 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung

Collaborator Author

/bot run --disable-fail-fast --stage-list "DGX_B300-4_GPUs-PyTorch-1, GB200-4_GPUs-PyTorch-5"

@tensorrt-cicd

Collaborator

PR_Github #55345 [ run ] triggered by Bot. Commit: a21d821 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #55345 [ run ] completed with state SUCCESS. Commit: a21d821
/LLM/main/L0_MergeRequest_PR pipeline #44294 (Partly Tested) completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@chienchunhung

Collaborator Author

/bot run --disable-fail-fast

2 similar comments
@chienchunhung

Collaborator Author

/bot run --disable-fail-fast

@chienchunhung

Collaborator Author

/bot run --disable-fail-fast


5 participants